IDA: A System for Automated Sorting, Indexing, and Classification of Documents

Authors

  • Gerd Maderlechner
  • Thomas Brückner
  • Peter Suda
Abstract

IDA (Intelligent Document Analysis) is a modular software system that helps automate paper document entry. It consists of the following components: layout analysis, preclassification, an OCR interface, fuzzy string matching, text categorization, and lexical, syntactic, and semantic analysis. The system has been applied to a variety of tasks: presorting of forms, reports, and letters; index extraction for archiving and retrieval; text column analysis in real estate register documents; in-house mail distribution; and classification of business letters by text content. This paper presents an overview of the architecture and applications of the system.
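The abstract names fuzzy string matching as one component but does not say how it works. As a rough illustration only, and not IDA's actual method, the sketch below uses Python's standard `difflib` module to map a noisy OCR token onto a small lexicon. The function name `best_match`, the sample department list, and the `0.6` cutoff are all assumptions made for this example.

```python
from difflib import get_close_matches


def best_match(ocr_token, lexicon, cutoff=0.6):
    """Return the lexicon entry most similar to ocr_token, or None.

    Hypothetical helper: similarity is difflib's ratio, not IDA's measure.
    """
    # Compare case-insensitively, but return the entry in its original casing.
    lowered = {entry.lower(): entry for entry in lexicon}
    hits = get_close_matches(ocr_token.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Assumed example data: department names for in-house mail distribution.
departments = ["Accounting", "Purchasing", "Personnel", "Research"]

print(best_match("Acc0unting", departments))  # OCR read "o" as "0"
print(best_match("xyz", departments))         # nothing similar enough
```

A similarity cutoff of this kind trades recall for precision: raising it rejects more OCR errors as unmatched, while lowering it risks routing a document to the wrong entry.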


Similar resources

Text Categorization

Text categorization (also known as text classification, or topic spotting) is the task of automatically sorting a set of documents into categories from a predefined set. This task has several applications, including automated indexing of scientific articles according to predefined thesauri of technical terms, filing patents into patent directories, selective dissemination of information to info...


Automatic Analysis and indexing of variable-layout documents

In this paper a methodology for analysis and automatic indexing of imaged documents within an archiving and retrieval system is described. This system, which is being developed within the Esprit project STRETCH (STorage and RETrieval by Content of imaged documents), is based on a new generation Archiving and Retrieval Engine (ARE), which overcomes the bottleneck of document profiling by allevia...


An automated approach to analysis and classification of Crypto-ransomwares’ family

There is no doubt that malicious programs are one of the permanent threats to computer systems. Malicious programs disrupt the normal operation of computer systems to serve their own purposes. There is also a type of malware, known as ransomware, that restricts victims' access to their computer systems either by encrypting the victim's files or by locking the system. Despite ...


Improving the Quality of Text Classification Using a Two-Level Classifier Committee

Automated text classification has gained particular importance owing to the increasing availability of documents in digital form and the ensuing need to organize them. Although this problem belongs to the field of Information Retrieval (IR), the dominant approach is based on machine learning techniques. Approaches based on classifier committees have shown better performance than the others. I...


Investigating the Adoption Rate of Students' Mental Model with the Structure of the Learning Management System of the University of Tehran by Card Sorting Method

Background and Aim: E-learning is an important topic in educational settings, and students play an essential role in the acceptance and effective use of e-learning management systems, so knowing their attitudes and mental models is essential for the successful implementation of such a method. Therefore, the aim of this study was to investigate...



Publication date: 1996